Skip to content

feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API#3423

Merged
amir-deris merged 14 commits into
mainfrom
amir/plt-329-migrate-x-evm-metrics-to-otel
May 14, 2026
Merged

feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API#3423
amir-deris merged 14 commits into
mainfrom
amir/plt-329-migrate-x-evm-metrics-to-otel

Conversation

@amir-deris
Copy link
Copy Markdown
Contributor

@amir-deris amir-deris commented May 12, 2026

Summary

Migrates telemetry in the x/evm/keeper and x/evm/ante packages from the legacy telemetry/utilmetrics helpers to the standardized OpenTelemetry Meter API, following the same pattern established in #3396 for app and app/ante.

  • Adds x/evm/keeper/metrics.go with a struct-based OTel instrument set registered via otel.Meter("evm_keeper"), initialized once via InitEvmKeeperMetrics() called from app.go after SetupOtelMetricsProvider()
  • Adds x/evm/ante/metrics.go with an evmAnteMetrics struct via otel.Meter("evm_ante"), initialized once via InitEvmAnteMetrics()
  • Replaces telemetry.ModuleMeasureSince / utilmetrics / seimetrics.SafeTelemetry* calls in all target files with dual-emitted OTel instruments; legacy calls remain with TODO(PLT-330) markers until dashboards are migrated
  • Threads ctx.Context() / goCtx through all .Record() and .Add() call sites
  • Disambiguates the evm_association_error_total counter that was registered by both keeper and ante into evm_keeper_association_error_total and evm_ante_association_error_total

New metrics (OTel naming convention, exported via the process-wide MeterProvider)

ABCI phase durations — histograms bucketed at fine-grained ms thresholds:

  • evm_abci_begin_blocker_duration_seconds
  • evm_abci_end_blocker_duration_seconds

Block fee:

  • evm_block_base_fee — synchronous gauge set each EndBlock

EVM transaction counters:

  • evm_panics_total
  • evm_errors_total — by type label (state_transition, stateDB_finalize, write_receipt, apply_message, vm_execution)
  • evm_receipt_status_total — by status label (success, failed)

Association errors:

  • evm_keeper_association_error_total — by scenario and type labels (emitted from keeper)
  • evm_ante_association_error_total — by scenario and type labels (emitted from ante)

Zero-storage cleanup:

  • evm_zero_storage_processed_keys_total
  • evm_zero_storage_pruned_keys_total
  • evm_zero_storage_pruned_bytes_total

Ante nonce/fee tracking:

  • evm_pending_nonce_total — by event label (added, expired, rejected, accepted)
  • evm_nonce_mismatch_total — by too_high bool attribute (true = nonce above expected, false = nonce below expected)
  • evm_effective_gas_price — histogram of effective gas price per EVM transaction

Note

Medium Risk
Touches EVM ante/keeper hot paths (BeginBlock/EndBlock, tx processing, nonce/fee checks) to add new OTel metric emissions; risk is mainly around performance/label correctness and potential numeric conversions for gauges/histograms.


Note

Medium Risk
Touches EVM ante/keeper hot paths (tx processing and BeginBlock/EndBlock) to add new OpenTelemetry metric emissions alongside legacy metrics; risk is mainly performance overhead and correctness of labels/value conversions (big.Int→float64).

Overview
Adds new OpenTelemetry Meter instrument sets for x/evm/ante and x/evm/keeper (new metrics.go files) and wires them into EVM hot paths.

Keeper now records BeginBlock/EndBlock duration histograms, base-fee gauge values (with safe big.Intfloat64 conversion), tx panic/error/receipt-status counters, association-error counters, and zero-storage cleanup counters; ante now records effective gas price histograms plus nonce-mismatch and association-error counters.

Legacy utils/metrics/SafeTelemetry* calls remain in place but are renamed/aliased and annotated with TODO(PLT-330) while dashboards migrate.

Reviewed by Cursor Bugbot for commit ba7a653. Bugbot is set up for automated code reviews on this repo. Configure here.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 12, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedMay 14, 2026, 6:23 PM

@codecov
Copy link
Copy Markdown

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 69.01408% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.23%. Comparing base (0f16b6d) to head (ba7a653).

Files with missing lines Patch % Lines
x/evm/keeper/metrics.go 38.46% 6 Missing and 2 partials ⚠️
x/evm/ante/fee.go 45.45% 5 Missing and 1 partial ⚠️
x/evm/keeper/msg_server.go 62.50% 6 Missing ⚠️
x/evm/ante/metrics.go 50.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #3423   +/-   ##
=======================================
  Coverage   59.23%   59.23%           
=======================================
  Files        2116     2118    +2     
  Lines      175440   175463   +23     
=======================================
+ Hits       103916   103932   +16     
- Misses      62460    62463    +3     
- Partials     9064     9068    +4     
Flag Coverage Δ
sei-chain-pr 72.53% <69.01%> (?)
sei-db 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
x/evm/ante/preprocess.go 80.23% <100.00%> (+0.23%) ⬆️
x/evm/ante/sig.go 83.78% <100.00%> (+0.92%) ⬆️
x/evm/keeper/abci.go 59.32% <100.00%> (+3.76%) ⬆️
x/evm/keeper/evm.go 74.81% <100.00%> (+0.18%) ⬆️
x/evm/keeper/storage_cleanup.go 89.39% <100.00%> (+0.50%) ⬆️
x/evm/ante/metrics.go 50.00% <50.00%> (ø)
x/evm/ante/fee.go 68.49% <45.45%> (-4.53%) ⬇️
x/evm/keeper/msg_server.go 80.36% <62.50%> (+1.94%) ⬆️
x/evm/keeper/metrics.go 38.46% <38.46%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread x/evm/keeper/metrics.go Outdated
@amir-deris amir-deris changed the title Amir/plt 329 migrate x evm metrics to otel feat(x/evm): migrate x/evm metrics to OpenTelemetry Meter API May 12, 2026
@amir-deris amir-deris self-assigned this May 12, 2026
Comment thread Makefile Outdated
else \
DETACH_FLAG=""; \
fi; \
DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grafana and Prometheus were adding a lot of noise to the logs so added --no-attach flag. Their logs can still be viewed with docker logs sei-grafana and docker logs sei-prometheus

@amir-deris amir-deris requested a review from bdchatham May 12, 2026 22:32
Comment thread x/evm/keeper/metrics.go Outdated
Comment thread x/evm/ante/fee.go Outdated
@amir-deris amir-deris requested a review from masih May 12, 2026 22:32
@amir-deris amir-deris changed the base branch from main to amir/plt-323-migrate-app-metrics-to-otel-meter-api May 12, 2026 22:34
evaluation_interval: 15s

scrape_configs:
- job_name: 'cryptosim'
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rollback these changes to existing prometheus and grafana yaml files and create new configs.

Comment thread x/evm/keeper/metrics.go Outdated
Comment on lines +1 to +11
apiVersion: 1
providers:
- name: default
orgId: 1
folder: ""
type: file
disableDeletion: false
updateIntervalSeconds: 30
options:
path: /var/lib/grafana/dashboards
foldersFromFilesStructure: false
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have context on these dashboards here. Are these to spin up dashboards for the docker compose personal stack? Seems like yes but want to confirm.

We're creating a mirror in the platform deployment, is that right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually these are for local (docker) environment. So the idea is having a new target in MakeFile so we can run make docker-cluster-start-monitoring locally with docker compose and have a prometheus and grafana local containers and Grafana user interface to investigate metrics, do performance measurement of our changes without having to release them.
We currently don't have a clean setup locally for this. We can run scripts like ./docker/monitornode/scripts/start-prometheus.sh but their ports conflict with existing setup and we need to shut them down manually.

@@ -0,0 +1,8 @@
apiVersion: 1
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of docker compose pieces here. I'm slightly leaning towards us not creating these and just supporting them on the new platform Grafana. Seems like complexity & scope that isn't worth it's weight to me although I like that we are thinking about the tooling end of this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for feedback. I think these help with testing locally and iterating quickly for measuring performance changes against a metric. If that is ok, I would rather keep it 😊

Comment thread x/evm/ante/fee.go Outdated
Comment thread x/evm/ante/sig.go Outdated
// this nonce has already been mined, we cannot accept it again
metrics.IncrementPendingNonce("rejected")
utilmetrics.IncrementPendingNonce("rejected") // TODO(PLT-330): remove once evm_pending_nonce_total verified
evmAnteMetrics.pendingNonce.Add(ctx.Context(), 1, otelmetric.WithAttributes(attribute.String("event", "rejected")))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you could define the attributes once and reuse them since they are the same each time. Prevents creating a now struct each time to represent this and just reuses an immutable instance.

Should apply to all of these that have a known set of possible values ahead of time.

Comment thread x/evm/ante/sig.go Outdated
Comment on lines +131 to +133
cause := "too_low"
if tooHigh {
cause = "too_high"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could simplify to "lower" and "higher" just too simplify use for dashboards or tools.

Comment thread x/evm/keeper/metrics.go Outdated
Comment thread x/evm/ante/metrics.go Outdated
Comment on lines +26 to +30
func mustAnteMetric[V any](v V, err error) V {
if err != nil {
panic(err)
}
return v
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you're doing this but in different ways across packages. Consider consolidating into a single place for all your metrics structs to reuse.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for feedback. Per discussion with @masih , it seems in GO the emphasis is on packages being independent and not have coupling for code reuse.

Comment thread Makefile Outdated
Comment on lines +321 to +338
# Start 4-node cluster with Prometheus and Grafana monitoring
docker-cluster-start-monitoring: docker-cluster-stop build-docker-node
@rm -rf $(PROJECT_HOME)/build/generated
@mkdir -p $(shell go env GOPATH)/pkg/mod
@mkdir -p $(shell go env GOCACHE)
@cd docker && \
if [ "$${DOCKER_DETACH:-}" = "true" ]; then \
DETACH_FLAG="-d"; \
else \
DETACH_FLAG=""; \
fi; \
DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) NUM_ACCOUNTS=10 INVARIANT_CHECK_INTERVAL=${INVARIANT_CHECK_INTERVAL} UPGRADE_VERSION_LIST=${UPGRADE_VERSION_LIST} MOCK_BALANCES=${MOCK_BALANCES} GIGA_EXECUTOR=${GIGA_EXECUTOR} GIGA_OCC=${GIGA_OCC} RECEIPT_BACKEND=${RECEIPT_BACKEND} AUTOBAHN=${AUTOBAHN} GIGA_STORAGE=${GIGA_STORAGE} docker compose -f docker-compose.yml -f docker-compose.monitoring.yml up --no-attach grafana --no-attach prometheus $$DETACH_FLAG
.PHONY: docker-cluster-start-monitoring

# Stop monitoring containers (Prometheus and Grafana) and cluster
docker-cluster-stop-monitoring:
@cd docker && DOCKER_PLATFORM=$(DOCKER_PLATFORM) USERID=$(shell id -u) GROUPID=$(shell id -g) GOCACHE=$(shell go env GOCACHE) docker compose -f docker-compose.yml -f docker-compose.monitoring.yml down
.PHONY: docker-cluster-stop-monitoring
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any issues with testing this in your Harbour personal stack? Let me know if I can help. Ideally it's easy enough that you can use that instead of rolling net-new infra like this to test your changes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this local version could allow faster iterations and experiments, to make sure there is value in the optimization example being made. Once we confirm, we can move to Harbour and test that modification in a more prod like environment. It also can lower our AWS cost to not take it to Harbour when there is no observed improvement.

@amir-deris amir-deris force-pushed the amir/plt-329-migrate-x-evm-metrics-to-otel branch from 96669ca to 6eb1415 Compare May 13, 2026 16:15
Comment thread Makefile Outdated
@amir-deris amir-deris force-pushed the amir/plt-329-migrate-x-evm-metrics-to-otel branch from 6eb1415 to f9e0114 Compare May 13, 2026 18:04
Comment thread x/evm/keeper/metrics.go Outdated
Comment thread x/evm/ante/metrics.go
@amir-deris amir-deris force-pushed the amir/plt-329-migrate-x-evm-metrics-to-otel branch 2 times, most recently from 2d4f018 to d2c9c1b Compare May 13, 2026 22:55
Comment thread x/evm/keeper/abci.go Outdated
Copy link
Copy Markdown
Collaborator

@masih masih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of recommendations. This PR is still a PR to a non-main branch. Could we have this based against main please for the final review?

Thanks Amir!

@@ -0,0 +1,8 @@
apiVersion: 1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amir-deris could you move the docker grafana stuff to a separate PR please?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sir! will do that.

Comment thread x/evm/ante/sig.go Outdated
utilmetrics.IncrementNonceMismatch(tooHigh) // TODO(PLT-330): remove once evm_nonce_mismatch_total verified
cause := "lower"
if tooHigh {
cause = "higher"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turn tooHigh into a bool attribute instead if stringify-ing it?

Base automatically changed from amir/plt-323-migrate-app-metrics-to-otel-meter-api to main May 14, 2026 10:18
@amir-deris amir-deris force-pushed the amir/plt-329-migrate-x-evm-metrics-to-otel branch from 4b6883f to a5c2a10 Compare May 14, 2026 15:07
Comment thread x/evm/ante/sig.go Outdated
Comment thread x/evm/keeper/metrics.go
Comment thread x/evm/keeper/metrics.go
Comment thread x/evm/ante/fee.go
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit ba7a653. Configure here.

Comment thread x/evm/ante/metrics.go
"evm_pending_nonce",
metric.WithDescription("EVM pending nonce events by type (added, expired, rejected, accepted)"),
metric.WithUnit("{count}"),
)),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused pendingNonce metric registered but never emitted

Low Severity

The pendingNonce counter is registered in evmAnteMetrics (allocating OTel resources at init) but evmAnteMetrics.pendingNonce is never referenced anywhere in the codebase. The corresponding legacy function IncrementPendingNonce in utils/metrics is also never called. This is dead code that registers a metric instrument at startup for no purpose. The same pattern exists in app/ante/metrics.go with app_pending_nonce, which is also unused.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ba7a653. Configure here.

@amir-deris amir-deris enabled auto-merge May 14, 2026 18:29
@amir-deris amir-deris added this pull request to the merge queue May 14, 2026
Merged via the queue into main with commit d79122f May 14, 2026
42 checks passed
@amir-deris amir-deris deleted the amir/plt-329-migrate-x-evm-metrics-to-otel branch May 14, 2026 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants